Conversation

@AkhileshNegi AkhileshNegi commented Feb 8, 2026

Summary

Target issue is #493

Checklist

Before submitting a pull request, please ensure that you complete these tasks:

  • Ran fastapi run --reload app/main.py or docker compose up in the repository root and tested.
  • If you've fixed a bug or added code, ensured it is covered by test cases.

Notes

  • Restructured evaluation processing to aggregate pending evaluations by project
  • Updated cron endpoint response format to display per-run details instead of organization-level summaries
  • Improved error handling and failure tracking per project


coderabbitai bot commented Feb 8, 2026

📝 Walkthrough

The evaluation cron processing system is refactored from organization-centric grouping to project-centric grouping. The polling function signature simplifies by removing the organization ID parameter, processing all pending evaluation runs across all organizations and grouping them by project. Response structures are updated to reflect per-run details instead of per-organization summaries.
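The project-centric grouping the walkthrough describes can be sketched roughly like this (illustrative only: the function name and run attributes such as project_id are assumptions, not the repository's actual code):

```python
from collections import defaultdict
from typing import Any


def group_runs_by_project(pending_runs: list[Any]) -> dict[Any, list[Any]]:
    """Group all pending evaluation runs by their project_id.

    Assumes each run exposes a `project_id` attribute; the real
    models in this PR may differ.
    """
    runs_by_project: dict[Any, list[Any]] = defaultdict(list)
    for run in pending_runs:
        runs_by_project[run.project_id].append(run)
    return dict(runs_by_project)
```

Each resulting bucket can then be processed with its own per-project OpenAI/Langfuse clients, as the walkthrough notes.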

Changes

Cron Route Endpoint (backend/app/api/routes/cron.py)
  Removed auth-related imports (AuthContextDep, User), updated the docstring to reflect project-based processing, and modified the response structure to remove the organizations_processed field while retaining total_processed, total_failed, and total_still_processing.

Cron Processing Logic (backend/app/crud/evaluations/cron.py, backend/app/crud/evaluations/processing.py)
  Refactored from per-organization looping to a unified delegation model. The poll_all_pending_evaluations signature changed from (session, org_id) to (session). Added project-based grouping with per-project OpenAI/Langfuse client initialization. Introduced a synchronous wrapper function, process_all_pending_evaluations_sync. Updated error handling to be project-scoped rather than organization-scoped.

Test Updates (backend/app/tests/api/routes/test_cron.py, backend/app/tests/crud/evaluations/test_processing.py)
  Updated cron endpoint tests to expect run-level fields (run_id, run_name, action) instead of org-level fields. Removed the org_id argument from poll_all_pending_evaluations test calls. Updated assertions to validate total_processed, total_failed, and total_still_processing without organization-specific expectations.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant CronRoute as Cron Route<br/>(cron.py)
    participant ProcessCron as Process Cron<br/>(cron.py)
    participant PollFunc as Poll Function<br/>(processing.py)
    participant Database as Database<br/>Session
    participant ProjectGroup as Project<br/>Grouping

    Client->>CronRoute: GET /cron/evaluation_jobs
    CronRoute->>ProcessCron: process_all_pending_evaluations_sync(session)
    ProcessCron->>PollFunc: await poll_all_pending_evaluations(session)
    PollFunc->>Database: Fetch all pending evaluation runs
    Database-->>PollFunc: List of pending runs
    PollFunc->>ProjectGroup: Group runs by project_id
    ProjectGroup-->>PollFunc: Dict[project_id, List[runs]]
    loop For each project
        PollFunc->>PollFunc: Initialize OpenAI/Langfuse<br/>clients per project
        PollFunc->>PollFunc: Process all runs<br/>in project
        PollFunc->>Database: Update run statuses/results
    end
    PollFunc-->>ProcessCron: Summary dict<br/>(total_processed, total_failed,<br/>total_still_processing, details)
    ProcessCron-->>CronRoute: Processed response
    CronRoute-->>Client: Response with run-level<br/>details (no org grouping)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • Kaapi v1.0: Enhancing the test suite #488: Extensive test additions and updates for evaluation processing and cron endpoint that directly align with the signature and behavior changes in poll_all_pending_evaluations and process_all_pending_evaluations functions.

Suggested labels

enhancement, ready-for-review

Suggested reviewers

  • Prajna1999

Poem

Hop along with projects now so bright, 🐰✨
No more orgs to group in sight,
Every run polled far and wide,
Grouped by project, full of pride!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 inconclusive)

  • Title check: ❓ Inconclusive. The title 'Evaluation: Refactor cron' is vague and lacks specificity about what refactoring was performed or why it matters; it uses generic terminology without conveying the actual change. Consider a more descriptive title that explains the key change, such as 'Evaluation: Refactor cron to group processing by project instead of organization' or 'Evaluation: Consolidate evaluation polling into single query'.

✅ Passed checks (2 passed)

  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, which is sufficient; the required threshold is 80.00%.
  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.



codecov bot commented Feb 8, 2026

Codecov Report

❌ Patch coverage is 90.90909% with 1 line in your changes missing coverage. Please review.

Files with missing lines:

  • backend/app/crud/evaluations/cron.py: 50.00% patch coverage, 1 line missing ⚠️


@AkhileshNegi AkhileshNegi marked this pull request as ready for review February 9, 2026 04:09
@AkhileshNegi AkhileshNegi linked an issue Feb 9, 2026 that may be closed by this pull request

@coderabbitai coderabbitai bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
backend/app/crud/evaluations/processing.py (1)

750-755: ⚠️ Potential issue | 🟡 Minor

"embeddings_completed" and "embeddings_failed" are counted as still_processing.

check_and_process_evaluation can return actions "embeddings_completed" and "embeddings_failed" (lines 515, 536), both of which leave the eval run in "completed" status. Because they don't match "processed" or "failed", they fall into the else branch and increment total_still_processing_count, which is misleading in the summary.

Proposed fix
                     if result["action"] == "processed":
                         total_processed_count += 1
-                    elif result["action"] == "failed":
+                    elif result["action"] in ("failed", "embeddings_failed"):
                         total_failed_count += 1
+                    elif result["action"] == "embeddings_completed":
+                        total_processed_count += 1
                     else:
                         total_still_processing_count += 1
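Extracted into a standalone helper, the corrected tally might look like this (a sketch using the action strings named above; summarize_actions is a hypothetical name, not a function in this PR):

```python
def summarize_actions(actions: list[str]) -> dict[str, int]:
    """Tally per-run actions into the cron summary counters.

    Counts "embeddings_completed" as processed and "embeddings_failed"
    as failed, so runs that already reached a terminal state are no
    longer misreported as still processing.
    """
    processed = failed = still_processing = 0
    for action in actions:
        if action in ("processed", "embeddings_completed"):
            processed += 1
        elif action in ("failed", "embeddings_failed"):
            failed += 1
        else:
            still_processing += 1
    return {
        "total_processed": processed,
        "total_failed": failed,
        "total_still_processing": still_processing,
    }
```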
backend/app/api/routes/cron.py (1)

54-60: ⚠️ Potential issue | 🟡 Minor

Error response is missing "results" key, inconsistent with the success shape.

The success path (line 47) returns a dict that always contains "results" (set by process_all_pending_evaluations in cron.py). The route-level error handler here omits it, which may break callers expecting a uniform schema.

Proposed fix
         return {
             "status": "error",
             "error": str(e),
             "total_processed": 0,
             "total_failed": 0,
             "total_still_processing": 0,
+            "results": [],
         }
🧹 Nitpick comments (3)
backend/app/crud/evaluations/processing.py (1)

699-701: Deriving org_id from the first run is safe given the data model, but fragile if the assumption changes.

org_id = project_runs[0].organization_id relies on all runs sharing the same org for a given project. This holds because Project belongs to a single Organization, but consider adding a brief comment explaining why this is safe (the FK relationship), so future readers don't question it.
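One way to make that invariant explicit is a small helper that documents and guards it (organization_id_for_project is an illustrative name, not code from this PR):

```python
from typing import Any


def organization_id_for_project(project_runs: list[Any]) -> Any:
    """Derive the organization ID for runs grouped under one project.

    Safe because a Project belongs to exactly one Organization (FK
    relationship), so every run grouped under a project shares the
    same organization_id. The assert documents and enforces that.
    """
    org_ids = {run.organization_id for run in project_runs}
    assert len(org_ids) == 1, f"runs span multiple orgs: {org_ids}"
    return project_runs[0].organization_id
```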

backend/app/crud/evaluations/cron.py (1)

66-78: asyncio.run() may conflict with an existing event loop.

asyncio.run() creates a new event loop and fails with RuntimeError if one is already running. While FastAPI runs sync endpoints in a threadpool (no active loop there), this is fragile — if the endpoint is ever changed to async def, or if this wrapper is called from any async context, it will break. Consider making the route handler async def and awaiting process_all_pending_evaluations directly.

Alternative: make the endpoint async and drop the sync wrapper

In backend/app/api/routes/cron.py:

-def evaluation_cron_job(
+async def evaluation_cron_job(
     session: SessionDep,
 ) -> dict:
-        result = process_all_pending_evaluations_sync(session=session)
+        result = await process_all_pending_evaluations(session=session)

Then this sync wrapper can be removed entirely.
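Alternatively, if the sync wrapper is kept, it could detect a running loop explicitly rather than letting asyncio.run() fail with an opaque RuntimeError; a minimal sketch (run_coroutine_sync is illustrative, not the PR's actual wrapper):

```python
import asyncio
from typing import Any, Coroutine


def run_coroutine_sync(coro: Coroutine[Any, Any, Any]) -> Any:
    """Run a coroutine to completion from synchronous code.

    asyncio.run() raises RuntimeError when called from a thread that
    already has a running event loop, so check for that case first
    and fail with a clearer message.
    """
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop in this thread (the normal threadpool case): safe.
        return asyncio.run(coro)
    raise RuntimeError(
        "run_coroutine_sync() called from a running event loop; "
        "await the coroutine directly instead"
    )
```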

backend/app/api/routes/cron.py (1)

19-21: Return type hint could be more specific.

Per the coding guidelines ("Always add type hints to all function parameters and return values in Python code"), all return values should have type hints. -> dict could be narrowed to -> dict[str, Any] for consistency with the rest of the codebase.

Proposed fix
-def evaluation_cron_job(
-    session: SessionDep,
-) -> dict:
+def evaluation_cron_job(
+    session: SessionDep,
+) -> dict[str, Any]:

You'll also need to add from typing import Any to the imports.

@AkhileshNegi AkhileshNegi self-assigned this Feb 9, 2026
@AkhileshNegi AkhileshNegi added the enhancement New feature or request label Feb 9, 2026
@AkhileshNegi AkhileshNegi merged commit 790048a into main Feb 9, 2026
2 of 3 checks passed
@AkhileshNegi AkhileshNegi deleted the enhancement/evaluation-refactor-cron branch February 9, 2026 05:19
@AkhileshNegi AkhileshNegi mentioned this pull request Feb 9, 2026
14 tasks

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Evaluation: Refactoring CRON

2 participants